Info-gibbs: a Motif Discovery Algorithm That Directly Optimizes Information Content during Sampling

Authors

  • Matthieu Defrance
  • Jacques van Helden
Abstract

MOTIVATION: Discovering cis-regulatory elements in genome sequence remains a challenging issue. Several methods rely on the optimization of some target scoring function. The information content (IC) or relative entropy of the motif has proven to be a good estimator of transcription factor DNA binding affinity. However, these information-based metrics are usually used as a posteriori statistics rather than during the motif search process itself.

RESULTS: We introduce here info-gibbs, a Gibbs sampling algorithm that efficiently optimizes the IC or the log-likelihood ratio (LLR) of the motif while keeping computation time low. The method compares well with existing methods like MEME, BioProspector, Gibbs or GAME on both synthetic and biological datasets. Our study shows that motif discovery techniques can be enhanced by directly focusing the search on the motif IC or the motif LLR.

AVAILABILITY: http://rsat.ulb.ac.be/rsat/info-gibbs
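As an illustration of the quantity the abstract refers to, the following minimal sketch computes the information content (relative entropy, in bits) of a set of aligned sites against a background model. It is not the info-gibbs implementation: the function name `information_content`, the uniform default background, and the per-base `pseudocount` parameter are assumptions made for this example only.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def information_content(sites, background=None, pseudocount=0.0):
    """Relative entropy (bits) of aligned, equal-length DNA sites.

    Hypothetical helper for illustration; not taken from info-gibbs.
    """
    if not sites:
        raise ValueError("at least one site is required")
    if background is None:
        # Assumption: uniform background frequencies.
        background = {base: 0.25 for base in ALPHABET}
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")

    # Each base receives the same pseudocount, so the denominator grows by 4x.
    denom = len(sites) + pseudocount * len(ALPHABET)
    total = 0.0
    for col in range(width):
        counts = Counter(site[col] for site in sites)
        for base in ALPHABET:
            freq = (counts[base] + pseudocount) / denom
            if freq > 0:
                # Column contribution: f * log2(f / p)
                total += freq * math.log2(freq / background[base])
    return total

if __name__ == "__main__":
    # Perfectly conserved 4-column motif: 2 bits per column.
    print(information_content(["ACGT", "ACGT", "ACGT"]))
```

A perfectly conserved column contributes 2 bits under a uniform background, and a column matching the background contributes 0. A sampler that optimizes IC directly would evaluate a score of this kind on candidate site sets during the search rather than only on the final motif.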

Similar resources

ARCS-Motif: discovering correlated motifs from unaligned biological sequences

MOTIVATION The goal of motif discovery is to detect novel, unknown, and important signals from biology sequences. In most models, the importance of a motif is equal to the sum of the similarity of every single position. In 2006, Song et al. introduced Aggregated Related Column Score (ARCS) measure which includes correlation information to the evaluation of motif importance. The paper showed tha...

A Fast, Alignment-Free, Conservation-Based Method for Transcription Factor Binding Site Discovery

As an increasing number of eukaryotic genomes are being sequenced, comparative studies aimed at detecting regulatory elements in intergenic sequences are becoming more prevalent. Most comparative methods for transcription factor (TF) binding site discovery make use of global or local alignments of orthologous regulatory regions to assess whether a particular DNA site is conserved across related...

A Combined Model and a Varied Gibbs Sampling Algorithm Used for Motif Discovery

The conserved sequences in gene regulatory regions dominate gene regulation. Discovering these sequences and their functions is important in post genome era. A novel model is constructed to represent conserved motifs of DNA sequences. This model is a combination of PWM and WAM models. The advantage is the new model not only can comprise individual base frequencies in the motifs, but also can em...

An improved Gibbs sampling method for motif discovery via sequence weighting.

The discovery of motifs in DNA sequences remains a fundamental and challenging problem in computational molecular biology and regulatory genomics, although a large number of computational methods have been proposed in the past decade. Among these methods, the Gibbs sampling strategy has shown great promise and is routinely used for finding regulatory motif elements in the promoter regions of co...

A Comparison Of Expectation Maximization and Gibbs Sampling Strategies for Motif Finding

A set of protein or nucleotide sequences may be found to share patterns reflecting biological structure, function and change. The task of identifying these patterns, known as motif finding, can be viewed as an instance of multiple sequence alignment. While it is possible to identify motifs using x-ray and magnetic resonance structures, biologists and computer scientists have developed several a...

Journal:
  • Bioinformatics

Volume 25, Issue 20

Pages: -

Publication date: 2009